ABSTRACT. The threshold degree of a function $f\colon\{0,1\}^n\to\{-1,+1\}$ is the least degree of a real polynomial $p$ with $f(x)\equiv\operatorname{sgn}p(x)$. We prove that the intersection of two halfspaces on $\{0,1\}^n$ has threshold degree $\Omega(n)$, which matches the trivial upper bound and completely answers a question due to Klivans (2002). The best previous lower bound was $\Omega(\sqrt{n})$. Our result shows that the intersection of two halfspaces on $\{0,1\}^n$ only admits a trivial $2^{\Theta(n)}$-time learning algorithm based on sign-representation by polynomials, unlike the advances achieved in PAC learning DNF formulas and read-once Boolean formulas. The proof introduces a new technique of independent interest, based on Fourier analysis and matrix theory.



1. Introduction 

A well-studied notion in computational learning theory is that of a perceptron. This term stands for the representation of a given Boolean function $f\colon\{0,1\}^n\to\{-1,+1\}$ in the form $f(x)\equiv\operatorname{sgn}p(x)$ for a real polynomial $p$ of some degree $d$. The least degree $d$ for which $f$ admits such a representation is called the threshold degree of $f$, denoted $\deg_\pm(f)$. In other words, $\deg_\pm(f)$ is the least degree of a real polynomial that agrees with $f$ in sign.
Perceptrons are appealing from a learning standpoint because they immediately lead to efficient learning algorithms. In more detail, let $f\colon\{0,1\}^n\to\{-1,+1\}$ be an unknown function of threshold degree $d$. Then by definition, $f$ has a representation of the form

$$f(x)\equiv\operatorname{sgn}\Big(\sum_{|S|\le d}\lambda_S\prod_{i\in S}x_i\Big)$$

for some reals $\lambda_S$ and is thus a halfspace in $N=\binom{n}{0}+\binom{n}{1}+\cdots+\binom{n}{d}$ dimensions. As a result, $f$ can be PAC learned in time polynomial in $N$, using any of a variety of halfspace learning algorithms. (Throughout this paper, the term "PAC learning" refers to Valiant's standard model [40] of learning under arbitrary distributions.)
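To make the dimension count concrete, here is a minimal Python sketch (standard library only; the helper names `monomial_features` and `perceptron_dimension` are ours, not the paper's). It lists the monomial features $\prod_{i\in S}x_i$ with $|S|\le d$, checks that there are exactly $N$ of them, and verifies that XOR on two bits, with $-1$ meaning "true", is a halfspace over the four features of degree at most 2.

```python
from itertools import combinations
from math import comb

def monomial_features(x, d):
    """Return [prod_{i in S} x_i for all S with |S| <= d], ordered by |S|, then lexicographically."""
    n = len(x)
    feats = []
    for size in range(d + 1):
        for S in combinations(range(n), size):
            # The empty product (S = {}) gives the constant feature 1.
            feats.append(int(all(x[i] == 1 for i in S)))
    return feats

def perceptron_dimension(n, d):
    """N = C(n,0) + C(n,1) + ... + C(n,d)."""
    return sum(comb(n, i) for i in range(d + 1))

def sgn(v):
    return (v > 0) - (v < 0)

# XOR on two bits ({-1,+1} convention, -1 = true) is sign-represented by
# 1 - 2*x1 - 2*x2 + 4*x1*x2 = (1 - 2*x1)(1 - 2*x2), i.e., by a halfspace
# with weights lam over the features [1, x1, x2, x1*x2].
lam = [1, -2, -2, 4]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    value = sum(l * feat for l, feat in zip(lam, monomial_features(x, 2)))
    assert sgn(value) == (-1 if x[0] != x[1] else 1)

print(perceptron_dimension(4, 2), len(monomial_features((1, 0, 1, 1), 2)))  # 11 11
```

When $d=\Theta(n)$, the same count gives $N=2^{\Theta(n)}$, which is why a linear lower bound on the threshold degree rules out any nontrivial running time for this approach.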

The study of perceptrons dates back forty years to the seminal monograph of Minsky and Papert [25], who examined the threshold degree of several common functions. Today, the perceptron-based approach yields the fastest known PAC learning algorithms for several concept classes. One such is the class of DNF formulas of polynomial size, posed as a challenge in Valiant's original paper [40] and extensively studied over the past two decades. The fastest known algorithm for PAC learning DNF formulas runs in time $\exp\{\tilde O(n^{1/3})\}$ and is due to Klivans and Servedio [18]. Specifically, the authors of [18] prove an upper bound of $O(n^{1/3}\log n)$ on the threshold degree of polynomial-size DNF formulas, which essentially matches a classical lower bound of $\Omega(n^{1/3})$ due to Minsky and Papert [25].
Another success story of the perceptron-based approach is the concept class of Boolean formulas, i.e., Boolean circuits with fan-out 1 at every gate. O'Donnell and Servedio [29] proved an upper bound of $\sqrt{s}\,(\log s)^{O(d)}$ on the threshold degree of Boolean formulas of size $s$ and depth $d$, giving the first subexponential algorithm for a family of formulas of superconstant depth. This upper bound on the threshold degree was improved to $s^{0.5+o(1)}$ for any depth $d$ by Ambainis et al. [2], building on a quantum query algorithm of Farhi et al. [10]. More recently, Lee […] sharpened the upper bound to $O(\sqrt{s})$, which is tight. This line of research gives the fastest known algorithm for PAC learning Boolean formulas.

Another extensively studied problem in computational learning theory, and the subject of this paper, is the problem of learning intersections of halfspaces, i.e., conjunctions of functions of the form $f(x)=\operatorname{sgn}(\sum\alpha_ix_i-\theta)$ for some reals $\alpha_1,\dots,\alpha_n,\theta$. While solutions are known to several restrictions of this problem [… 23, 41, 3, 12, 19, 16], no algorithm has been discovered for PAC learning the intersection of even two halfspaces in time faster than $2^{\Theta(n)}$. Progress on proving hardness results has also been scarce. Indeed, all known hardness results […] either require polynomially many halfspaces or assume proper learning. In particular, we are not aware of any representation-independent hardness results for PAC learning the intersection of $O(1)$ halfspaces.



Our Results. Since the perceptron-based approach yields the fastest known algorithms for PAC learning DNF formulas and read-once Boolean formulas, it is natural to wonder whether it can yield any nontrivial results for the intersection of two halfspaces. Letting $D(n)$ stand for the maximum threshold degree over all intersections of two halfspaces on $\{0,1\}^n$, the question becomes whether $D(n)$ is a nontrivial (sublinear) function of the dimension $n$. This question has been studied by several authors, as summarized in Table 1. Forty years ago, Minsky and Papert [25] used a compactness argument to show that $D(n)=\omega(1)$, the function in question being the intersection of two majorities on disjoint sets of variables. O'Donnell and Servedio [29] studied the same function using a rather different approach and thereby proved that $D(n)=\Omega(\log n/\log\log n)$. No nontrivial upper bounds on $D(n)$ being known, Klivans […, §7] formally posed the problem of proving a lower bound substantially better than $\Omega(\log n)$ or an upper bound of $o(n)$.

It was recently shown in [34] that $D(n)=\Omega(\sqrt{n})$, solving Klivans' problem and ruling out an $n^{o(\sqrt{n})}$-time PAC learning algorithm based on perceptrons. It is clear, however, that a PAC learning algorithm for the intersection of two halfspaces in time $n^{\Theta(\sqrt{n})}$ would still be a breakthrough in computational learning theory, comparable to the advances in the study of DNF formulas and read-once Boolean formulas. The main contribution of this paper is to prove that $D(n)=\Omega(n)$, which matches the trivial upper bound and definitively rules out the perceptron-based approach for learning the intersection of two halfspaces in nontrivial time.



Result                                      Reference
$D(n)=\omega(1)$                            [25]
$D(n)=\Omega(\log n/\log\log n)$            [29]
$D(n)=\Omega(\sqrt{n})$                     [34]
$D(n)=\Theta(n)$                            this paper

Table 1: Lower bounds for the intersection of two halfspaces.
THEOREM 1 (Main result). For $n=1,2,3,\dots$, let $D(n)$ denote the maximum threshold degree of a function of the form $f(x)\wedge g(x)$, where $f,g\colon\{0,1\}^n\to\{-1,+1\}$ are halfspaces. Then

$$D(n)=\Theta(n).$$

To be more precise, we give a randomized algorithm which with probability at least $1-e^{-n/12}$ constructs two halfspaces on $\{0,1\}^n$ whose intersection has threshold degree $\Theta(n)$. In Section 6 we develop several refinements of Theorem 1. For example, we show that the intersection of two halfspaces on $\{0,1\}^n$ requires a perceptron with $\exp\{\Theta(n)\}$ monomials, i.e., does not have a sparse sign-representation. We also give an essentially tight lower bound on the threshold degree of the intersection of a halfspace and a majority function, improving quadratically on the previous bound in [34].

In summary, unlike DNF formulas and read-once Boolean formulas, the intersection of two halfspaces does not admit a nontrivial sign-representation. Apart from computational learning theory, lower bounds on the threshold degree have played a key role in several works on circuit complexity [30, 39, …, 36], Turing complexity classes […], and communication complexity [36, 35, 37, 31]. For this reason, we consider Theorem 1 and the techniques used to obtain it to be of interest outside of computational learning.

Theorem 1 and much previous work suggest that the nature of a PAC learning problem changes significantly when, instead of Valiant's original arbitrary-distribution setting, one considers learning with respect to restricted distributions. For example, the uniform distribution on the sphere $\mathbb S^n$ or hypercube $\{0,1\}^n$ allows the use of tools other than sign-representing polynomials, such as Fourier analysis. In particular, polynomial-time algorithms are known for the uniform-distribution learning of intersections of a constant number of halfspaces on the sphere [7, 41] and hypercube [17]. Furthermore, if membership queries are allowed, DNF formulas are known to be learnable in polynomial time with respect to the uniform distribution on the hypercube [12].

Our Techniques. Let $f\wedge f$ denote the conjunction of two copies of a given Boolean function $f$, each on an independent set of variables. It was shown in [34] that the threshold degree of $f\wedge f$ equals, up to a small multiplicative constant, the least degree of a rational function $R$ with $\|f-R\|_\infty\le1/3$. With this characterization in hand, the equality $\deg_\pm(f\wedge f)=\Theta(\sqrt{n})$ was derived in [34] by solving the rational approximation problem for the halfspace



Unfortunately, the $\Theta(\sqrt{n})$ barrier is fundamental to the analysis in [34]. To prove that in fact $D(n)=\Theta(n)$, we pursue a rather different approach.

The intuition behind our work is as follows. Let $\alpha_1,\alpha_2,\dots,\alpha_n$ be given nonzero integers, and let $f\colon\{0,1\}^n\to\{-1,+1\}$ be a given Boolean function such that $f(x)$ is completely determined by the sum $\sum\alpha_ix_i$. When approximating $f$ pointwise by polynomials and rational functions of a given degree, can one restrict attention to those approximants that are, like $f$, functions of the sum $\sum\alpha_ix_i$ alone rather than the individual bits $x_1,x_2,\dots,x_n$? If true, this claim would dramatically simplify the analysis of the threshold degree of $f$ by reducing it to a univariate question. Minsky and Papert [25] showed that the claim is indeed true in the highly special case $\alpha_1=\alpha_2=\cdots=\alpha_n$. For the purposes of this paper, however, the nonzero coefficients $\alpha_1,\alpha_2,\dots,\alpha_n$ must be of increasing orders of magnitude and in particular must satisfy

$$\max_{i,j}\left|\frac{\alpha_i}{\alpha_j}\right|\ge\exp\{\Omega(n)\}.$$



Minsky and Papert's argument breaks down completely in this setting, and with good reason: coefficients $\alpha_1,\dots,\alpha_n$ are easily constructed [5] for which the passage to univariate approximation increases the degree requirement from 1 to $n$.

To overcome this difficulty, we use techniques from Fourier analysis and matrix perturbation theory. Specifically, we define an appropriate distribution on $n$-tuples $(\alpha_1,\dots,\alpha_n)$ and study the behavior of the sum $\sum\alpha_ix_i$ as the vector $x$ ranges over $\{0,1\}^n$. We prove that for a typical $n$-tuple $(\alpha_1,\dots,\alpha_n)$ and any collection of sums $S\subseteq\mathbb Z$ of interest, the subset $X_S\subseteq\{0,1\}^n$ that induces the sums in $S$ is highly random in that membership in $X_S$ is uncorrelated with any polynomial of degree up to $\Theta(n)$. With some additional work, this allows the sought passage to a univariate question. In particular, we are able to prove the existence of a halfspace $f\colon\{0,1\}^n\to\{-1,+1\}$ such that any multivariate rational approximant for $f$ gives a univariate rational approximant for the sign function on $\{\pm1,\pm2,\pm3,\dots,\pm2^{\Theta(n)}\}$ with the same degree and error. The univariate question being well understood, we infer that $f$ requires a rational function of degree $\Omega(n)$ for pointwise approximation within $1/3$, and hence $\deg_\pm(f\wedge f)\ge\Omega(n)$ by the characterization from [34].

2. Preliminaries 

Notation. We will view Boolean functions as mappings $X\to\{0,1\}$ or $X\to\{-1,+1\}$ for some finite set $X$, where the output value 1 corresponds to "true" in the former case and "false" in the latter. We adopt the following standard definition of the sign function:

$$\operatorname{sgn}x=\begin{cases}-1, & x<0,\\ 0, & x=0,\\ 1, & x>0.\end{cases}$$



The complement of a set $S$ is denoted $\overline{S}$. We denote the symmetric difference of sets $S$ and $T$ by $S\oplus T=(S\cap\overline{T})\cup(\overline{S}\cap T)$. For a finite set $X$, the symbol $\mathscr P(X)$ denotes the family of all $2^{|X|}$ subsets of $X$. For functions $f,g\colon X\to\mathbb R$ on a finite set $X$, we use the notation

$$\langle f,g\rangle=\frac{1}{|X|}\sum_{x\in X}f(x)g(x).$$

We let $\log x$ stand for the logarithm of $x$ to the base 2. The binary entropy function $H\colon[0,1]\to[0,1]$ is given by $H(p)=-p\log p-(1-p)\log(1-p)$ and is strictly increasing on $[0,1/2]$. The following bound is well known [13, p. 283]:

$$\sum_{i=0}^{k}\binom{n}{i}\le2^{H(k/n)n},\qquad k=0,1,2,\dots,\lfloor n/2\rfloor.\tag{2.1}$$
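The bound (2.1) is easy to sanity-check numerically. The following Python sketch (the helper names are ours) compares both sides for all $k\le\lfloor n/2\rfloor$ and small $n$; it is a spot check of the inequality, not a proof, and uses a tiny relative slack to absorb floating-point rounding at $k=0$, where equality holds.

```python
from math import comb, log2

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def entropy_bound_holds(n):
    """Check sum_{i<=k} C(n, i) <= 2^{H(k/n) n} for k = 0, 1, ..., floor(n/2)."""
    for k in range(n // 2 + 1):
        lhs = sum(comb(n, i) for i in range(k + 1))
        rhs = 2 ** (binary_entropy(k / n) * n)
        if lhs > rhs * (1 + 1e-9):  # tiny slack for floating-point rounding
            return False
    return True

print(all(entropy_bound_holds(n) for n in range(1, 80)))  # True
```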

For elements $x,y$ of a given set, we use the Kronecker delta

$$\delta_{x,y}=\begin{cases}1, & x=y,\\ 0, & x\ne y.\end{cases}$$
The symbol $P_k$ stands for the family of all univariate real polynomials of degree up to $k$. The majority function $\mathrm{MAJ}_n\colon\{0,1\}^n\to\{-1,+1\}$ has the usual definition:

$$\mathrm{MAJ}_n(x)=\begin{cases}-1, & x_1+x_2+\cdots+x_n>n/2,\\ 1, & \text{otherwise}.\end{cases}$$



Fourier transform. Consider the vector space of functions $\{0,1\}^n\to\mathbb R$, equipped with the inner product

$$\langle f,g\rangle=2^{-n}\sum_{x\in\{0,1\}^n}f(x)g(x).$$

For $S\subseteq\{1,2,\dots,n\}$, define $\chi_S\colon\{0,1\}^n\to\{-1,+1\}$ by $\chi_S(x)=(-1)^{\sum_{i\in S}x_i}$. Then $\{\chi_S\}_{S\subseteq\{1,2,\dots,n\}}$ is an orthonormal basis for the inner product space in question. As a result, every function $f\colon\{0,1\}^n\to\mathbb R$ has a unique representation of the form

$$f=\sum_{S\subseteq\{1,2,\dots,n\}}\hat f(S)\,\chi_S,$$

where $\hat f(S)=\langle f,\chi_S\rangle$. The reals $\hat f(S)$ are called the Fourier coefficients of $f$. The orthonormality of $\{\chi_S\}$ immediately yields Parseval's identity:

$$\sum_{S\subseteq\{1,2,\dots,n\}}\hat f(S)^2=\langle f,f\rangle=\mathop{\mathbf E}_{x\in\{0,1\}^n}[f(x)^2].\tag{2.2}$$
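A brute-force computation makes these definitions concrete. The Python sketch below (our own helper names, exact rational arithmetic) computes all coefficients $\hat f(S)$ of $\mathrm{MAJ}_3$ in the $\{-1,+1\}$ convention above, then checks Parseval's identity (2.2) and that $\sum_S\hat f(S)\chi_S$ reproduces $f$ pointwise. It is only practical for very small $n$.

```python
from fractions import Fraction
from itertools import product

def chi(S, x):
    """Character chi_S(x) = (-1)^{sum_{i in S} x_i}."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier_coefficients(f, n):
    """Map each S (a frozenset) to hat f(S) = 2^{-n} sum_x f(x) chi_S(x), computed exactly."""
    points = list(product((0, 1), repeat=n))
    subsets = [frozenset(i for i in range(n) if m[i]) for m in points]
    return {S: Fraction(sum(f(x) * chi(S, x) for x in points), 2 ** n) for S in subsets}

def maj3(x):
    """MAJ_3 in the {-1,+1} convention of the paper (-1 = true)."""
    return -1 if sum(x) > 3 / 2 else 1

coeffs = fourier_coefficients(maj3, 3)
# Parseval (2.2): sum_S hat f(S)^2 = E[f(x)^2] = 1 for a {-1,+1}-valued f.
assert sum(c ** 2 for c in coeffs.values()) == 1
# Uniqueness of the expansion: f = sum_S hat f(S) chi_S pointwise.
for x in product((0, 1), repeat=3):
    assert sum(c * chi(S, x) for S, c in coeffs.items()) == maj3(x)
```

For $\mathrm{MAJ}_3$ the nonzero coefficients are $\hat f(\{i\})=1/2$ for each single variable and $\hat f(\{1,2,3\})=-1/2$, so the squares indeed sum to 1.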



Matrices. The symbol $\mathbb R^{m\times n}$ refers to the family of all $m\times n$ matrices with real entries. A matrix $A\in\mathbb R^{n\times n}$ is called strictly diagonally dominant if

$$|A_{ii}|>\sum_{j\ne i}|A_{ij}|,\qquad i=1,2,\dots,n.$$

A well-known result in matrix perturbation theory, due to Gershgorin [11], states that the eigenvalues of a matrix lie in the union of certain disks in the complex plane centered around the diagonal entries of the matrix. We will need the following very special case, which corresponds to showing that the eigenvalues are all nonzero.



THEOREM 2.1 (Gershgorin). Let $A\in\mathbb R^{n\times n}$ be strictly diagonally dominant. Then $A$ is nonsingular.

Proof (Gershgorin). Fix a nonzero vector $x\in\mathbb R^n$ and choose $i$ such that $|x_i|=\|x\|_\infty$. Then by strict diagonal dominance,

$$|(Ax)_i|=\Big|\sum_{j=1}^nA_{ij}x_j\Big|\ge|A_{ii}|\,\|x\|_\infty-\sum_{j\ne i}|A_{ij}|\,\|x\|_\infty>0,$$

so that $Ax\ne0$. □
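Theorem 2.1 can be spot-checked with exact arithmetic. The Python sketch below (hypothetical helper names; determinant by Gaussian elimination over `fractions.Fraction`) builds random integer matrices whose diagonal entries strictly exceed the off-diagonal row sums and confirms each determinant is nonzero. It also shows that strictness cannot be dropped: with equality in every row, the all-ones $2\times2$ matrix is singular.

```python
import random
from fractions import Fraction

def strictly_diagonally_dominant(A):
    """|A_ii| > sum_{j != i} |A_ij| for every row i."""
    n = len(A)
    return all(abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i) for i in range(n))

def determinant(A):
    """Exact determinant via Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in A]
    n = len(M)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= factor * M[c][j]
    return det

rng = random.Random(1)
for _ in range(500):
    n = rng.randint(1, 6)
    A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        off = sum(abs(A[i][j]) for j in range(n) if j != i)
        A[i][i] = rng.choice([-1, 1]) * (off + rng.randint(1, 3))  # force strict dominance
    assert strictly_diagonally_dominant(A) and determinant(A) != 0

# Strictness matters: equality |A_ii| = sum_{j != i} |A_ij| allows a singular matrix.
assert not strictly_diagonally_dominant([[1, 1], [1, 1]])
assert determinant([[1, 1], [1, 1]]) == 0
```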
Rational approximation. The degree of a rational function $p(x)/q(x)$, where $p$ and $q$ are polynomials on $\mathbb R^n$, is the maximum of the degrees of $p$ and $q$. Consider a function $f\colon X\to\{-1,+1\}$, where $X\subseteq\mathbb R^n$. For $d\ge0$, define

$$R(f,d)=\inf_{p,q}\sup_{x\in X}\left|f(x)-\frac{p(x)}{q(x)}\right|,$$

where the infimum is over multivariate polynomials $p$ and $q$ of degree up to $d$ such that $q$ does not vanish on $X$. In words, $R(f,d)$ is the least error in an approximation of $f$ by a multivariate rational function of degree up to $d$. A closely related quantity is

$$R^+(f,d)=\inf_{p,q}\sup_{x\in X}\left|f(x)-\frac{p(x)}{q(x)}\right|,$$

where the infimum is over multivariate polynomials $p$ and $q$ of degree up to $d$ such that $q$ is positive on $X$. These two quantities are related in a straightforward way:

$$R^+(f,2d)\le R(f,d)\le R^+(f,d).$$

The second inequality here is trivial. The first follows from the fact that every rational approximant $p(x)/q(x)$ of degree $d$ gives rise to a degree-$2d$ rational approximant with the same error and a positive denominator, namely, $\{p(x)q(x)\}/q(x)^2$.
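The squaring trick is easy to see on a toy case. In the Python sketch below, the degree-1 approximant $1/x$ to $\operatorname{sgn}x$ on $\{\pm1,\pm2,\pm3,\pm4\}$ is a hypothetical example chosen only for illustration (it is far from optimal); its denominator changes sign, yet $(p\,q)/q^2=x/x^2$ has the same values, the same error, and a positive denominator of degree 2.

```python
from fractions import Fraction

def sgn(x):
    return (x > 0) - (x < 0)

def sup_error(p, q, points):
    """sup_{x in points} |sgn x - p(x)/q(x)|; q must not vanish on points."""
    assert all(q(x) != 0 for x in points)
    return max(abs(sgn(x) - Fraction(p(x), q(x))) for x in points)

S = [v for m in range(1, 5) for v in (m, -m)]  # the set {±1, ±2, ±3, ±4}

# A degree-1 approximant p/q = 1/x whose denominator changes sign on S.
p = lambda x: 1
q = lambda x: x

# The degree-2 approximant (p q)/q^2 has the same values and a positive denominator.
p2 = lambda x: p(x) * q(x)
q2 = lambda x: q(x) ** 2

print(sup_error(p, q, S), sup_error(p2, q2, S), all(q2(x) > 0 for x in S))  # 3/4 3/4 True
```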

The infimum in the definitions of $R(f,d)$ and $R^+(f,d)$ cannot in general be replaced by a minimum [32], even when $X$ is a finite subset of $\mathbb R$. This contrasts with the more familiar setting of a finite-dimensional normed linear space, where least-error approximants are guaranteed to exist.

For $S\subseteq\mathbb R$, we let

$$R^+(S,d)=\inf_{p,q}\sup_{x\in S}\left|\operatorname{sgn}x-\frac{p(x)}{q(x)}\right|,$$

where the infimum ranges over $p,q\in P_d$ such that $q$ is positive on $S$. The study of the rational approximation of the sign function dates back to seminal work by Zolotarev [42] in the late 19th century. A much later result due to Newman [28] gives highly accurate estimates of $R^+([-n,-1]\cup[1,n],d)$ for all $n$ and $d$. Newman's work in particular provides upper bounds on $R^+(\{\pm1,\pm2,\dots,\pm n\},d)$, which in [34] were sharpened and complemented with matching lower bounds to the following effect:

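THEOREM 2.2 (Sherstov). Let $n,d$ be positive integers, and let $R=R^+(\{\pm1,\pm2,\dots,\pm n\},d)$. For $1\le d\le\log n$,

$$\exp\Big\{-\Theta\Big(\frac{1}{n^{1/(2d)}}\Big)\Big\}\le R<\exp\Big\{-\frac{1}{n^{1/d}}\Big\}.$$

For $\log n<d<n$,

$$R=\exp\Big\{-\Theta\Big(\frac{d}{\log(2n/d)}\Big)\Big\}.$$

For $d\ge n$,

$$R=0.$$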

Theorem 2.2 has the following corollary [34, Thm. 1.7], in which we adopt the notation

$$\operatorname{rdeg}_\epsilon(f)=\min\{d:R^+(f,d)\le\epsilon\}.$$
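THEOREM 2.3 (Sherstov). Let $\mathrm{MAJ}_n\colon\{0,1\}^n\to\{-1,+1\}$ denote the majority function. Then

$$\operatorname{rdeg}_\epsilon(\mathrm{MAJ}_n)=\begin{cases}\Theta\big(\log\{2n/\log(1/\epsilon)\}\cdot\log(1/\epsilon)\big), & 2^{-n}<\epsilon<1/3,\\[4pt] \Theta\Big(1+\dfrac{\log n}{\log\{1/(1-\epsilon)\}}\Big), & 1/3\le\epsilon<1.\end{cases}$$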



Threshold degree. Let $f\colon X\to\{-1,+1\}$ be a given Boolean function, where $X\subset\mathbb R^n$ is finite. The threshold degree of $f$, denoted $\deg_\pm(f)$, is the least degree of a polynomial $p(x)$ such that $f(x)\equiv\operatorname{sgn}p(x)$. The term "threshold degree" appears to be due to Saks [33]. Equivalent terms in the literature include "strong degree" […], "voting polynomial degree" […], "polynomial threshold function degree" [29], and "sign degree" [9].

Given functions $f\colon X\to\{-1,+1\}$ and $g\colon Y\to\{-1,+1\}$, we let the symbol $f\wedge g$ stand for the function $X\times Y\to\{-1,+1\}$ given by $(f\wedge g)(x,y)=f(x)\wedge g(y)$. Note that in this notation, $f$ and $f\wedge f$ are completely different functions, the former having domain $X$ and the latter $X\times X$. An elegant observation, due to Beigel et al. [6], relates the notions of sign-representation and rational approximation for conjunctions of Boolean functions.

THEOREM 2.4 (Beigel, Reingold, and Spielman). Let $f\colon X\to\{-1,+1\}$ and $g\colon Y\to\{-1,+1\}$ be given functions, where $X,Y\subseteq\mathbb R^n$. Let $d$ be an integer with $R^+(f,d)+R^+(g,d)<1$. Then

$$\deg_\pm(f\wedge g)\le2d.$$

Proof (Beigel, Reingold, and Spielman). Consider rational functions $p_1(x)/q_1(x)$ and $p_2(y)/q_2(y)$ of degree at most $d$ such that $q_1$ and $q_2$ are positive on $X$ and $Y$, respectively, and

$$\sup_{x\in X}\left|f(x)-\frac{p_1(x)}{q_1(x)}\right|+\sup_{y\in Y}\left|g(y)-\frac{p_2(y)}{q_2(y)}\right|<1.$$

Then

$$f(x)\wedge g(y)\equiv\operatorname{sgn}\{1+f(x)+g(y)\}\equiv\operatorname{sgn}\left\{1+\frac{p_1(x)}{q_1(x)}+\frac{p_2(y)}{q_2(y)}\right\}.$$

Multiplying the last expression by the positive quantity $q_1(x)q_2(y)$ gives $f(x)\wedge g(y)\equiv\operatorname{sgn}\{q_1(x)q_2(y)+p_1(x)q_2(y)+p_2(y)q_1(x)\}$. □
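The two sign equalities in this proof can be checked mechanically. In the Python sketch below (helper names are ours), `AND` implements conjunction in the $\{-1,+1\}$ convention where $-1$ means "true". The first loop confirms $a\wedge b=\operatorname{sgn}(1+a+b)$; the second replaces $a,b$ by random perturbations whose errors sum to less than 1, standing in for the rational approximants $p_1/q_1$ and $p_2/q_2$, and confirms the sign is unchanged.

```python
import random
from itertools import product

def sgn(v):
    return (v > 0) - (v < 0)

def AND(a, b):
    """Conjunction in the {-1,+1} convention, where -1 means true."""
    return -1 if a == -1 and b == -1 else 1

# The identity used in the proof: a AND b = sgn(1 + a + b) for a, b in {-1, +1}.
for a, b in product((-1, 1), repeat=2):
    assert AND(a, b) == sgn(1 + a + b)

# Replacing a and b by approximations whose errors sum to less than 1 keeps the sign,
# because 1 + a + b only takes the values -1, 1, and 3.
rng = random.Random(0)
for _ in range(10000):
    a, b = rng.choice((-1, 1)), rng.choice((-1, 1))
    e1 = rng.uniform(0, 0.999)
    e2 = rng.uniform(0, 0.999 - e1)
    approx_a = a + rng.uniform(-e1, e1)  # stands in for p1(x)/q1(x)
    approx_b = b + rng.uniform(-e2, e2)  # stands in for p2(y)/q2(y)
    assert sgn(1 + approx_a + approx_b) == AND(a, b)
```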

We will also need a converse to Theorem 2.4, proved in [34, Thm. 3.9].

THEOREM 2.5 (Sherstov). Let $f\colon X\to\{-1,+1\}$ and $g\colon Y\to\{-1,+1\}$ be given functions, where $X,Y\subset\mathbb R^n$ are arbitrary finite sets. Assume that $f$ and $g$ are not identically false. Let $d=\deg_\pm(f\wedge g)$. Then

$$R^+(f,4d)+R^+(g,2d)<1.$$

Symmetric functions. Let $S_n$ denote the symmetric group on $n$ elements. For $\sigma\in S_n$ and $x\in\{0,1\}^n$, we denote $\sigma x=(x_{\sigma(1)},\dots,x_{\sigma(n)})\in\{0,1\}^n$. For $x\in\{0,1\}^n$, we define $|x|=x_1+x_2+\cdots+x_n$. A function $\phi\colon\{0,1\}^n\to\mathbb R$ is called symmetric if $\phi(x)=\phi(\sigma x)$ for every $x\in\{0,1\}^n$ and every $\sigma\in S_n$. Equivalently, $\phi$ is symmetric if $\phi(x)$ is uniquely determined by $|x|$. Symmetric functions on $\{0,1\}^n$ are intimately related to univariate polynomials, as borne out by Minsky and Papert's symmetrization argument [25].

PROPOSITION 2.6 (Minsky and Papert). Let $\phi\colon\{0,1\}^n\to\mathbb R$ be a polynomial of degree $d$. Then there is a polynomial $p\in P_d$ such that

$$\mathop{\mathbf E}_{\sigma\in S_n}[\phi(\sigma x)]=p(|x|),\qquad x\in\{0,1\}^n.$$



We will need the following consequence of Minsky and Papert's technique for rational functions, pointed out in [34, Prop. 2.7].

PROPOSITION 2.7. Let $n_1,\dots,n_k$ be positive integers. Consider a function $F\colon\{0,1\}^{n_1}\times\cdots\times\{0,1\}^{n_k}\to\{-1,+1\}$ such that $F(x_1,\dots,x_k)\equiv f(|x_1|,\dots,|x_k|)$ for some $f\colon\{0,1,\dots,n_1\}\times\cdots\times\{0,1,\dots,n_k\}\to\{-1,+1\}$. Then for all $d$,

$$R^+(F,d)=R^+(f,d).$$
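Proposition 2.6 can be observed directly for small $n$. The Python sketch below averages a hypothetical degree-2 polynomial $\phi$ over all of $S_n$ with exact arithmetic, checks that the average depends on $x$ only through $|x|$, and checks that the $d+1$ values at $|x|=0,\dots,d$ already determine a polynomial in $P_d$ matching all $n+1$ values. The polynomial representation (a dict from monomials to coefficients) and all helper names are our own choices.

```python
from fractions import Fraction
from itertools import permutations, product

def evaluate(poly, x):
    """Evaluate a multilinear polynomial given as {tuple of variable indices: coefficient}."""
    total = Fraction(0)
    for mono, coef in poly.items():
        term = Fraction(coef)
        for i in mono:
            term *= x[i]
        total += term
    return total

def symmetrize(poly, x):
    """E_{sigma in S_n}[phi(sigma x)] with sigma x = (x_{sigma(1)}, ..., x_{sigma(n)})."""
    perms = list(permutations(range(len(x))))
    return sum(evaluate(poly, tuple(x[i] for i in s)) for s in perms) / len(perms)

def interpolate(points, t):
    """Value at t of the unique polynomial of degree < len(points) through `points`."""
    result = Fraction(0)
    for j, (tj, yj) in enumerate(points):
        term = Fraction(yj)
        for m, (tm, _) in enumerate(points):
            if m != j:
                term *= Fraction(t - tm, tj - tm)
        result += term
    return result

n, d = 4, 2
phi = {(0, 1): 3, (2,): -1, (1, 3): 2}  # hypothetical phi = 3*x0*x1 - x2 + 2*x1*x3 (0-based)

by_weight = {}
for x in product((0, 1), repeat=n):
    by_weight.setdefault(sum(x), set()).add(symmetrize(phi, x))
assert all(len(vals) == 1 for vals in by_weight.values())  # depends on x only through |x|

values = {t: next(iter(vals)) for t, vals in by_weight.items()}
# d + 1 values pin down p in P_d; check that this p matches at every t = 0, ..., n.
nodes = [(t, values[t]) for t in range(d + 1)]
assert all(interpolate(nodes, t) == values[t] for t in range(n + 1))
```

Here the symmetrized polynomial works out to $p(t)=\tfrac{5}{12}t(t-1)-\tfrac{t}{4}$, which indeed has degree $2=d$.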

3. Analysis of Random Halfspaces 

In this section, we prove a certain structural property of random halfspaces. Specifically, we will fix integers $w_1,w_2,\dots,w_n$ at random from a suitable range and analyze the sum

$$\sum_{i=1}^nw_ix_i$$

as $x$ ranges over $\{0,1\}^n$. Our objective will be to show that, for a typical choice of the weights $w_1,w_2,\dots,w_n$, the distribution of this sum modulo $2^{\Theta(n)}$ is highly random. More precisely, we will show that the subset $X_s\subseteq\{0,1\}^n$ that induces any particular sum $s$ modulo $2^{\Theta(n)}$ is relatively large and that membership in $X_s$ is almost uncorrelated with any polynomial of low degree. We start with a technical lemma.

LEMMA 3.1. Let $f,g\colon\{0,1\}^n\to\{0,1\}$ be given functions. Fix an integer $k$ with $0\le k\le n/2$. For a set $S\subseteq\{1,2,\dots,n\}$, define $F_S\colon\{0,1\}^n\to\{0,1\}$ by

$$F_S(x)=f(x)\wedge\Big[g(x)\oplus\bigoplus_{i\in S}x_i\Big].$$

Fix a real $\xi>0$. Then with probability at least $1-2^{-n+H(k/n)n+2\xi n}$ over a uniformly random choice of $S\in\mathscr P(\{1,2,\dots,n\})$, one has

$$\left|\hat F_S(T)-\tfrac12\hat f(T)\right|<2^{-\xi n-1},\qquad |T|\le k.\tag{3.1}$$



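Proof. Define $\phi\colon\{0,1\}^n\to[-1/2,1/2]$ by $\phi(x)=f(x)g(x)-\frac12f(x)$. Define $\mathscr S\subseteq\mathscr P(\{1,2,\dots,n\})$ by $\mathscr S=\{S:|\hat\phi(S)|\ge2^{-\xi n-1}\}$. By Parseval's identity (2.2), $\sum_S\hat\phi(S)^2=\mathbf E[\phi(x)^2]\le1/4$, whence

$$|\mathscr S|\le4^{\xi n}.\tag{3.2}$$

Since $F_S(x)=\frac12f(x)+(-1)^{\sum_{i\in S}x_i}\phi(x)$, we have

$$\left|\hat F_S(T)-\tfrac12\hat f(T)\right|=|\hat\phi(S\oplus T)|,\qquad S,T\subseteq\{1,2,\dots,n\}.\tag{3.3}$$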
For a uniformly random $S\in\mathscr P(\{1,2,\dots,n\})$, the set $\{S\oplus T:|T|\le k\}$ contains any fixed element of $\mathscr P(\{1,2,\dots,n\})$ with probability $2^{-n}\sum_{i=0}^k\binom ni$. By the union bound, we infer that

$$\mathbf P\big[\{S\oplus T:|T|\le k\}\cap\mathscr S\ne\varnothing\big]\le|\mathscr S|\,2^{-n}\sum_{i=0}^k\binom ni,$$

which in view of (2.1) and (3.2) is bounded from above by $2^{-n+H(k/n)n+2\xi n}$. This observation, along with (3.3), completes the proof. □
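The exact identity behind (3.3), namely $\hat F_S(T)=\frac12\hat f(T)+\hat\phi(S\oplus T)$, can be verified by brute force. The Python sketch below draws random $f,g\colon\{0,1\}^n\to\{0,1\}$ for small $n$ (the random tables and helper names are our own test scaffolding) and checks the identity for every pair $S,T$ using exact rational Fourier coefficients; Python's `^` on frozensets is the symmetric difference $S\oplus T$.

```python
import random
from fractions import Fraction
from itertools import product

def chi(S, x):
    """Character chi_S(x) = (-1)^{sum_{i in S} x_i}."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier(h, n, S):
    """hat h(S) = 2^{-n} sum_x h(x) chi_S(x), computed exactly."""
    return sum(Fraction(h(x)) * chi(S, x) for x in product((0, 1), repeat=n)) / 2 ** n

def check_lemma_identity(n, seed):
    """Check hat F_S(T) = hat f(T)/2 + hat phi(S xor T) for random f, g and all S, T."""
    rng = random.Random(seed)
    cube = list(product((0, 1), repeat=n))
    f_table = {x: rng.randint(0, 1) for x in cube}
    g_table = {x: rng.randint(0, 1) for x in cube}
    f, g = f_table.__getitem__, g_table.__getitem__
    phi = lambda x: f(x) * g(x) - Fraction(f(x), 2)  # values in [-1/2, 1/2]
    subsets = [frozenset(i for i in range(n) if m[i]) for m in cube]
    for S in subsets:
        # F_S(x) = f(x) AND [g(x) XOR (XOR_{i in S} x_i)] with 1 meaning true.
        F_S = lambda x, S=S: f(x) & (g(x) ^ (sum(x[i] for i in S) % 2))
        for T in subsets:
            if fourier(F_S, n, T) != fourier(f, n, T) / 2 + fourier(phi, n, S ^ T):
                return False
    return True

print(check_lemma_identity(4, seed=2))  # True
```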



Using Lemma 3.1 and induction, we now obtain a key intermediate result.

LEMMA 3.2. Fix an integer $k\ge0$ and reals $\epsilon,\xi\in(0,1/2)$. Choose sets $S_0,S_1,\dots,S_k\in\mathscr P(\{1,2,\dots,n\})$ uniformly at random. Fix any integer $s$ and define $f\colon\{0,1\}^n\to\{0,1\}$ by

$$f(x)=1\iff\sum_{i=0}^k2^i\sum_{j\in S_i}x_j\equiv s\pmod{2^{k+1}}.\tag{3.4}$$

Then with probability at least $1-(k+1)2^{-n+H(\epsilon)n+2\xi n}$ over the choice of $S_0,S_1,\dots,S_k$, one has

$$\left|\hat f(T)-\frac{\delta_{T,\varnothing}}{2^{k+1}}\right|\le2^{-\xi n},\qquad |T|\le\epsilon n.\tag{3.5}$$



Proof. In view of the modular counting in (3.4), one may assume that $0\le s<2^{k+1}$ and therefore $s=\sum_{i=0}^k2^ib_i$ for some $b_0,b_1,\dots,b_k\in\{0,1\}$. The proof of the lemma is by induction on $k$ for a fixed $s$.

The base case $k=0$ corresponds to $f(x)=\frac12+\frac12(-1)^{b_0}\chi_{S_0}(x)$. One obtains (3.5) by conditioning on the event $|S_0|>\epsilon n$, which in view of (2.1) occurs with probability no smaller than $1-2^{-n+H(\epsilon)n}$.

We now consider the inductive step. Define $f'\colon\{0,1\}^n\to\{0,1\}$ by

$$f'(x)=1\iff\sum_{i=0}^{k-1}2^i\sum_{j\in S_i}x_j\equiv\sum_{i=0}^{k-1}2^ib_i\pmod{2^k}.$$

Let $E_1$ be the event, over the choice of $S_0,\dots,S_{k-1}$, that $|\hat f'(T)-2^{-k}\delta_{T,\varnothing}|\le2^{-\xi n}$ for $|T|\le\epsilon n$. By the inductive hypothesis,

$$\mathbf P[E_1]\ge1-k\,2^{-n+H(\epsilon)n+2\xi n}.\tag{3.6}$$

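Let $E_2$ be the event, over the choice of $S_0,\dots,S_k$, that $|\hat f(T)-\frac12\hat f'(T)|\le2^{-\xi n-1}$ for $|T|\le\epsilon n$. Together, $E_1$ and $E_2$ give $|\hat f(T)-2^{-k-1}\delta_{T,\varnothing}|\le2^{-\xi n-1}+\frac12\cdot2^{-\xi n}=2^{-\xi n}$ for $|T|\le\epsilon n$. In this terminology, it suffices to show that

$$\mathbf P[E_1\wedge E_2]\ge1-(k+1)2^{-n+H(\epsilon)n+2\xi n}.\tag{3.7}$$

Observe that

$$f(x)=f'(x)\wedge\Big[g(x)\oplus\bigoplus_{j\in S_k}x_j\Big],$$

where $g\colon\{0,1\}^n\to\{0,1\}$ is the function such that $g(x)=1$ if and only if $b_k$ is the $(k+1)$st least significant bit of the integer $\sum_{i=0}^{k-1}2^i\sum_{j\in S_i}x_j$. As a result, Lemma 3.1 (applied with $\lfloor\epsilon n\rfloor$ in place of $k$, using that $H$ is increasing on $[0,1/2]$) shows that $\mathbf P[E_2]\ge1-2^{-n+H(\epsilon)n+2\xi n}$. This bound, along with (3.6), settles (3.7) and thereby completes the induction. □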
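The decomposition $f=f'\wedge[g\oplus\bigoplus_{j\in S_k}x_j]$ used in the inductive step is a purely arithmetic fact about binary carries, and the Python sketch below checks it exhaustively for small $n$. The sets $S_0,\dots,S_k$ are drawn at random purely as test scaffolding, and the function name is ours; the underlying reason is that adding $2^k\sum_{j\in S_k}x_j$ leaves the low $k$ bits alone and flips bit $k$ exactly when that sum is odd.

```python
import random
from itertools import product

def check_decomposition(n, k, s, seed):
    """Check f(x) = f'(x) AND [g(x) XOR (XOR_{j in S_k} x_j)] for every x in {0,1}^n."""
    rng = random.Random(seed)
    sets = [[j for j in range(n) if rng.random() < 0.5] for _ in range(k + 1)]  # S_0..S_k
    s %= 2 ** (k + 1)
    b_k = (s >> k) & 1  # s = sum_i 2^i b_i
    for x in product((0, 1), repeat=n):
        partial = sum(2 ** i * sum(x[j] for j in sets[i]) for i in range(k))  # levels 0..k-1
        top = sum(x[j] for j in sets[k])
        f = int((partial + 2 ** k * top) % 2 ** (k + 1) == s)
        f_prime = int(partial % 2 ** k == s % 2 ** k)
        g = int(((partial >> k) & 1) == b_k)  # b_k is the (k+1)st least significant bit
        if f != (f_prime & (g ^ (top % 2))):
            return False
    return True

print(all(check_decomposition(6, k, s, seed)
          for k in range(4) for s in range(2 ** (k + 1)) for seed in range(3)))  # True
```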



10 



A. A. SHERSTOV 



We have reached the main result of this section. 

THEOREM 3.3 (Key property of random halfspaces). Fix an integer $k\ge0$ and reals $\epsilon,\xi\in(0,1/2)$. Choose integers $w_1,w_2,\dots,w_n$ uniformly at random from $\{0,1,\dots,2^{k+1}-1\}$. For $s\in\mathbb Z$, define $f_s\colon\{0,1\}^n\to\{0,1\}$ by

$$f_s(x)=1\iff\sum_{i=1}^nw_ix_i\equiv s\pmod{2^{k+1}}.\tag{3.8}$$

Then with probability at least $1-(k+1)2^{-n+H(\epsilon)n+2\xi n+k+1}$ over the choice of $w_1,w_2,\dots,w_n$, one has

$$\left|\hat f_s(T)-\frac{\delta_{T,\varnothing}}{2^{k+1}}\right|\le2^{-\xi n},\qquad |T|\le\epsilon n,\ s\in\mathbb Z.$$

Proof. In view of the modular counting in (3.8), it suffices to prove the theorem for $s\in\{0,1,\dots,2^{k+1}-1\}$. The functions $f_s$ have the following equivalent definition: pick sets $S_0,S_1,\dots,S_k\in\mathscr P(\{1,2,\dots,n\})$ uniformly at random and define

$$f_s(x)=1\iff\sum_{i=0}^k2^i\sum_{j\in S_i}x_j\equiv s\pmod{2^{k+1}}.$$

(Indeed, $S_i$ is the set of indices $j$ for which the $(i+1)$st least significant bit of $w_j$ is 1.) The proof is now complete by Lemma 3.2 and the union bound over $s$. □
4. Zeroing out Correlations by a Change of Distribution 

Recall the setting of the previous section, where we fixed integers $w_1,w_2,\dots,w_n$ at random from a suitable range and analyzed the sum $\sum_{i=1}^nw_ix_i$ as $x$ ranged over $\{0,1\}^n$. We showed that the subset $X_s\subseteq\{0,1\}^n$ that induces any particular sum $s$ modulo $2^{\Theta(n)}$ is relatively large and that membership in $X_s$ has almost zero correlation with any given polynomial of low degree. For the purposes of this paper, the correlations with low-degree polynomials need to be exactly zero. In this section we show that, with respect to a suitable distribution $\mu_s$ on each $X_s$, membership in $X_s$ will indeed have zero correlation with any low-degree polynomial.

A starting point in our discussion is a general statement on zeroing out the correlations of given Boolean functions $\chi_1,\chi_2,\dots,\chi_k$ with another Boolean function $f$. Recall that for functions $f,g\colon X\to\mathbb R$ on a finite set $X$, we use the notation

$$\langle f,g\rangle=\frac{1}{|X|}\sum_{x\in X}f(x)g(x).$$

THEOREM 4.1. Let $f,\chi_1,\chi_2,\dots,\chi_k\colon X\to\{-1,+1\}$ be given functions on a finite set $X$. Suppose that

$$\sum_{i=1}^k|\langle f,\chi_i\rangle|<\frac12,\tag{4.1}$$

$$\sum_{j\ne i}|\langle\chi_i,\chi_j\rangle|\le\frac12,\qquad i=1,2,\dots,k.\tag{4.2}$$

Then there exists a probability distribution $\mu$ on $X$ such that

$$\mathop{\mathbf E}_\mu[f(x)\chi_i(x)]=0,\qquad i=1,2,\dots,k.$$

Remark 4.2. A comment is in order on the hypothesis of Theorem 4.1. The theorem states that if $\chi_1,\chi_2,\dots,\chi_k$ each have a small correlation with $f$ and, in addition, have small pairwise correlations, then a distribution exists with respect to which $f$ is completely uncorrelated with $\chi_1,\chi_2,\dots,\chi_k$. The latter part of the hypothesis, namely the requirement (4.2) of small pairwise correlations for $\chi_1,\chi_2,\dots,\chi_k$, may seem unnecessary at first. In actuality, it is vital. Exponential lower bounds on the weights of linear perceptrons [27, 38] imply, by linear programming duality, the existence of functions $f,\chi_1,\chi_2,\dots,\chi_k\colon X\to\{-1,+1\}$ such that $|\langle f,\chi_i\rangle|=\exp\{-\Theta(k)\}$, $i=1,2,\dots,k$, and yet

$$f(x)\equiv\operatorname{sgn}\Big(\sum_{i=1}^k\alpha_i\chi_i(x)\Big)\tag{4.3}$$

for some fixed reals $\alpha_1,\dots,\alpha_k$. In this construction, the correlation of $f$ with each $\chi_i$ is small, in fact exponentially smaller than what is assumed in Theorem 4.1; nevertheless, the representation (4.3) rules out a distribution $\mu$ with respect to which $f$ could have zero correlation with each $\chi_i$, for such a distribution $\mu$ would have to obey

$$0<\mathop{\mathbf E}_\mu\Big[\Big|\sum_{i=1}^k\alpha_i\chi_i(x)\Big|\Big]=\mathop{\mathbf E}_\mu\Big[f(x)\sum_{i=1}^k\alpha_i\chi_i(x)\Big]=\sum_{i=1}^k\alpha_i\mathop{\mathbf E}_\mu[f(x)\chi_i(x)]=0.$$



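Proof of Theorem 4.1. Consider the linear system

$$M\alpha=y\tag{4.4}$$

in the unknown $\alpha\in\mathbb R^k$, where $M=[\langle\chi_i,\chi_j\rangle]_{i,j}$ is a matrix of order $k$ and $y=(\langle f,\chi_1\rangle,\dots,\langle f,\chi_k\rangle)\in\mathbb R^k$. Since the diagonal entries of $M$ are $\langle\chi_i,\chi_i\rangle=1$, (4.2) shows that $M$ is strictly diagonally dominant and hence nonsingular by Theorem 2.1. Fix the unique solution $\alpha$ to the system (4.4). Then $|\alpha_i|-\sum_{j\ne i}|\alpha_j||\langle\chi_i,\chi_j\rangle|\le|\langle f,\chi_i\rangle|$ for $i=1,2,\dots,k$. Summing these inequalities, we obtain

$$\sum_{i=1}^k|\alpha_i|-\sum_{j=1}^k|\alpha_j|\sum_{i\ne j}|\langle\chi_i,\chi_j\rangle|\le\sum_{i=1}^k|\langle f,\chi_i\rangle|,$$

which in view of (4.1) and (4.2) shows that $\sum_{i=1}^k|\alpha_i|<1$. Therefore, the function $\mu\colon X\to\mathbb R$ given by

$$\mu(x)=\epsilon\Big(1-f(x)\sum_{j=1}^k\alpha_j\chi_j(x)\Big)$$

is a probability distribution on $X$ for a suitable normalizing factor $\epsilon>0$. At last,

$$\mathop{\mathbf E}_\mu[f(x)\chi_i(x)]=\epsilon|X|\Big(\langle f,\chi_i\rangle-\sum_{j=1}^k\alpha_j\langle\chi_i,\chi_j\rangle\Big)=0,$$

where the final equality holds by (4.4). □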
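The proof of Theorem 4.1 is constructive, and the construction can be run with exact arithmetic. The Python sketch below solves (4.4) by Gauss-Jordan elimination over `fractions.Fraction`, forms $\mu(x)\propto1-f(x)\sum_j\alpha_j\chi_j(x)$, and returns it normalized. The instance is hypothetical and chosen so that (4.1) and (4.2) hold: $X$ is $\{0,1\}^6$ minus the all-ones point, $f=\chi_{\{0,1\}}$, and $\chi_i=\chi_{\{i\}}$ (0-based), so that every $|\langle f,\chi_i\rangle|$ and every off-diagonal $|\langle\chi_i,\chi_j\rangle|$ equals $1/63$ rather than 0.

```python
from fractions import Fraction
from itertools import product

def solve(M, y):
    """Solve M a = y exactly by Gauss-Jordan elimination (M assumed nonsingular)."""
    k = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, y)]
    for c in range(k):
        p = next(r for r in range(c, k) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][k] for r in range(k)]

def zero_correlation_distribution(X, f, chis):
    """Build mu(x) = eps * (1 - f(x) sum_j alpha_j chi_j(x)) as in the proof of Theorem 4.1."""
    ip = lambda g, h: Fraction(sum(g(x) * h(x) for x in X), len(X))  # <g, h> on X
    k = len(chis)
    M = [[ip(chis[i], chis[j]) for j in range(k)] for i in range(k)]
    y = [ip(f, c) for c in chis]
    assert sum(abs(v) for v in y) < Fraction(1, 2)  # hypothesis (4.1)
    assert all(sum(abs(M[i][j]) for j in range(k) if j != i) <= Fraction(1, 2)
               for i in range(k))                   # hypothesis (4.2)
    alpha = solve(M, y)                             # the system (4.4)
    weights = {x: 1 - f(x) * sum(a * c(x) for a, c in zip(alpha, chis)) for x in X}
    total = sum(weights.values())                   # equals 1/eps
    return {x: wt / total for x, wt in weights.items()}

# Hypothetical instance: X = {0,1}^6 minus the all-ones point, f = chi_{0,1}, chi_i = chi_{i}.
n = 6
X = [x for x in product((0, 1), repeat=n) if sum(x) < n]
f = lambda x: (-1) ** (x[0] + x[1])
chis = [lambda x, i=i: (-1) ** x[i] for i in range(n)]
mu = zero_correlation_distribution(X, f, chis)
```

Under the uniform distribution on this $X$, each correlation $\langle f,\chi_i\rangle$ equals $1/63$; under `mu` they all vanish exactly, which is what the test below checks.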



We are now in a position to prove the main result of this section. 
THEOREM 4.3. Let $\alpha>0$ be a sufficiently small absolute constant. Choose integers $w_1,w_2,\dots,w_n$ uniformly at random from $\{0,1,\dots,2^{\lfloor\alpha n\rfloor+1}-1\}$. For $s\in\mathbb Z$, define

$$X_s=\Big\{x\in\{0,1\}^n:\sum_{i=1}^nw_ix_i\equiv s\pmod{2^{\lfloor\alpha n\rfloor+1}}\Big\}.\tag{4.5}$$

Then with probability at least $1-e^{-n/3}$ over the choice of $w_1,w_2,\dots,w_n$, there is a distribution $\mu_s$ on $X_s$ (for each $s$) such that

$$\mathop{\mathbf E}_{\mu_s}[p(x)]=\mathop{\mathbf E}_{\mu_r}[p(x)]\tag{4.6}$$

for any $s,r\in\mathbb Z$ and any polynomial $p$ of degree at most $\lfloor\alpha n\rfloor$.

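Proof. Let $\alpha>0$ be sufficiently small. We will assume throughout the proof that $n\ge1/\alpha$, the theorem being trivial otherwise. Set $\epsilon=2\alpha$, $\xi=1/5$, and $k=\lfloor\alpha n\rfloor$ in Theorem 3.3. Then with probability at least $1-e^{-n/3}$ over the choice of $w_1,w_2,\dots,w_n$, one has

$$\left|\hat f_s(T)-\frac{\delta_{T,\varnothing}}{2^{\lfloor\alpha n\rfloor+1}}\right|\le2^{-n/5},\qquad |T|\le2\alpha n,\ s\in\mathbb Z,\tag{4.7}$$

where $f_s\colon\{0,1\}^n\to\{0,1\}$ is given by $f_s(x)=1\iff x\in X_s$. It follows that for each $s$,

$$|X_s|=2^n\hat f_s(\varnothing)\ge2^n\big(2^{-\lfloor\alpha n\rfloor-1}-2^{-n/5}\big).\tag{4.8}$$

For $f,g\colon\{0,1\}^n\to\mathbb R$, we will write $\langle f,g\rangle_{X_s}=|X_s|^{-1}\sum_{x\in X_s}f(x)g(x)$. Let $\mathscr S\subset\mathscr P(\{1,2,\dots,n\})$ be the system of nonempty subsets of at most $\alpha n$ elements. Fix any $T\in\mathscr S$. Then for each $s$,

$$\sum_{S\in\mathscr S\setminus\{T\}}|\langle\chi_S,\chi_T\rangle_{X_s}|=\frac{2^n}{|X_s|}\sum_{S\in\mathscr S\setminus\{T\}}|\hat f_s(S\oplus T)|\le\frac{2^n}{|X_s|}\cdot|\mathscr S|\cdot2^{-n/5}<\frac12,\tag{4.9}$$

where the final two inequalities follow from (2.1), (4.7), and (4.8). Similarly, for each $s$,

$$\sum_{S\in\mathscr S}|\langle1,\chi_S\rangle_{X_s}|=\frac{2^n}{|X_s|}\sum_{S\in\mathscr S}|\hat f_s(S)|\le\frac{2^n}{|X_s|}\cdot|\mathscr S|\cdot2^{-n/5}<\frac12.\tag{4.10}$$

In view of (4.9) and (4.10), Theorem 4.1 (applied on $X_s$ to the constant function 1 and the characters $\chi_S$, $S\in\mathscr S$) provides a distribution $\mu_s$ on $\{0,1\}^n$ that is supported on $X_s$ and obeys $\hat\mu_s(S)=0$ for $S\in\mathscr S$. Since $\mu_s$ is a probability distribution, we additionally have $\hat\mu_s(\varnothing)=2^{-n}$ for all $s$. In particular, the distributions $\mu_s$ have identical Fourier spectra up to coefficients of order $\alpha n$, which is another way of stating (4.6). □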



5. Reduction to a Univariate Problem 

Recall from the Introduction that the crux of our proof is to establish the existence of a halfspace $f\colon\{0,1\}^n\to\{-1,+1\}$ that requires a rational function of degree $\Theta(n)$ for pointwise approximation within $1/3$. The purpose of this section is to reduce this task, for a suitably chosen random halfspace, to a univariate problem. The univariate problem pertains to the uniform approximation of the sign function on the set $\{\pm1,\pm2,\pm3,\dots,\pm2^{\Theta(n)}\}$ and has been solved in previous work. Key to this univariate reduction will be the construction of probability distributions in the previous two sections.
THEOREM 5.1 (Reduction to a univariate problem). Put $k=\lfloor\alpha n\rfloor$, where $\alpha>0$ is the absolute constant from Theorem 4.3. Choose $w_1,w_2,\dots,w_n$ uniformly at random from $\{0,1,\dots,2^{k+1}-1\}$. Define $f\colon\{0,1\}^n\times\{0,1,2,\dots,n\}\to\{-1,+1\}$ by

$$f(x,t)=\operatorname{sgn}\Big(\frac12+\sum_{i=1}^nw_ix_i-2^{k+1}t\Big).$$

Then with probability at least $1-e^{-n/3}$ over the choice of $w_1,w_2,\dots,w_n$, one has

$$R^+(f,d)\ge R^+(\{\pm1,\pm2,\pm3,\dots,\pm2^k\},d),\qquad d=0,1,\dots,k.\tag{5.1}$$

Proof. For $s=\pm1,\pm2,\pm3,\dots,\pm2^k$, define $X_s\subseteq\{0,1\}^n$ by (4.5). Then by Theorem 4.3, with probability at least $1-e^{-n/3}$ there is a distribution $\mu_s$ on $X_s$ for each $s$ such that

$$\mathop{\mathbf E}_{\mu_s}[p(x)]=\mathop{\mathbf E}_{\mu_r}[p(x)]\tag{5.2}$$

for any $s,r\in\{\pm1,\pm2,\pm3,\dots,\pm2^k\}$ and any polynomial $p$ of degree no greater than $k$. In the remainder of the proof, we will work with a fixed choice of weights $w_1,w_2,\dots,w_n$ for which the described distributions $\mu_s$ exist.

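Suppose that $R^+(f,d)<\epsilon$, where $0<\epsilon<1$ and $d\le k$. Then there are degree-$d$ polynomials $p,q$ on $\mathbb R^n\times\mathbb R$ such that on the domain of $f$,

$$0<(1-\epsilon)q(x,t)<p(x,t)f(x,t)<(1+\epsilon)q(x,t).\tag{5.3}$$

On the support of $\mu_s$ (for $s=\pm1,\pm2,\pm3,\dots,\pm2^k$), the linear form

$$\ell(x,s)=\frac{1}{2^{k+1}}\Big(\sum_{i=1}^nw_ix_i-s\Big)$$

obeys $\ell(x,s)\in\{0,1,2,\dots,n\}$ and $f(x,\ell(x,s))=\operatorname{sgn}s$. Letting $t=\ell(x,s)$ in (5.3) and passing to expectations,

$$0<\mathop{\mathbf E}_{\mu_s}[q(x,\ell(x,s))](1-\epsilon)<\mathop{\mathbf E}_{\mu_s}[p(x,\ell(x,s))]\operatorname{sgn}s<\mathop{\mathbf E}_{\mu_s}[q(x,\ell(x,s))](1+\epsilon).$$

It follows from (5.2) that $\mathbf E_{\mu_s}[p(x,\ell(x,s))]=P(s)$ and $\mathbf E_{\mu_s}[q(x,\ell(x,s))]=Q(s)$ for some $P,Q\in P_d$ and all $s$. As a result, $R^+(\{\pm1,\pm2,\pm3,\dots,\pm2^k\},d)<\epsilon$, the approximant in question being $P/Q$. □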
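The two properties of the linear form $\ell(x,s)$ used in this proof are elementary, and the Python sketch below checks them exhaustively for small $n$ and $k$ with randomly drawn weights (the function name is ours). On $X_s$, the integer $2^{k+1}$ divides $\sum_iw_ix_i-s$, so $\ell$ is an integer; the bounds $0\le\sum_iw_ix_i\le n(2^{k+1}-1)$ and $|s|\le2^k$ force $0\le\ell\le n$; and $\frac12+\sum_iw_ix_i-2^{k+1}\ell=\frac12+s$ has the sign of $s$.

```python
import random
from itertools import product

def check_linear_form(n, k, seed=0):
    """On each X_s, check ell(x, s) is in {0, ..., n} and f(x, ell(x, s)) = sgn s."""
    rng = random.Random(seed)
    modulus = 2 ** (k + 1)
    w = [rng.randrange(modulus) for _ in range(n)]

    def f(x, t):
        # sgn(1/2 + sum_i w_i x_i - 2^{k+1} t), computed as sgn(1 + 2(...)) to stay in integers.
        value = 1 + 2 * (sum(wi * xi for wi, xi in zip(w, x)) - modulus * t)
        return 1 if value > 0 else -1

    targets = [v for m in range(1, 2 ** k + 1) for v in (m, -m)]  # s = ±1, ..., ±2^k
    for x in product((0, 1), repeat=n):
        total = sum(wi * xi for wi, xi in zip(w, x))
        for s in targets:
            if (total - s) % modulus:     # x is not in X_s
                continue
            ell = (total - s) // modulus  # exact, since modulus divides total - s
            if not (0 <= ell <= n) or f(x, ell) != (1 if s > 0 else -1):
                return False
    return True

print(all(check_linear_form(n, k, seed)
          for n in range(1, 9) for k in range(4) for seed in range(3)))  # True
```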

It remains to rewrite the previous theorem in terms of functions on the hypercube $\{0,1\}^{2n}$ rather than the set $\{0,1\}^n\times\{0,1,2,\dots,n\}$.



THEOREM 5.2. Put $k=\lfloor\alpha n\rfloor$, where $\alpha>0$ is the absolute constant from Theorem 4.3. Choose $w_1,w_2,\dots,w_n$ uniformly at random from $\{0,1,\dots,2^{k+1}-1\}$. Define $f\colon\{0,1\}^{2n}\to\{-1,+1\}$ by

$$f(x)=\operatorname{sgn}\Big(\frac12+\sum_{i=1}^nw_ix_i-2^{k+1}\sum_{i=n+1}^{2n}x_i\Big).$$

Then with probability at least $1-e^{-n/3}$ over the choice of $w_1,w_2,\dots,w_n$, one has

$$R^+(f,d)\ge R^+(\{\pm1,\pm2,\pm3,\dots,\pm2^k\},d),\qquad d=0,1,\dots,k.$$

Proof. Immediate from Proposition 2.7 and Theorem 5.1. □
6. Main Result and Generalizations

We now combine the newly obtained result on rational approximation with known
results from Section 2 to prove the main theorem of this work.

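THEOREM 6.1 (Main result). Fix sufficiently small absolute constants $\alpha > 0$ and $\beta =
\beta(\alpha) > 0$. Choose integers $w_1, w_2, \dots, w_n \in \{0, 1, \dots, 2^{\lfloor \alpha n \rfloor + 1} - 1\}$ uniformly at random.
Then with probability at least $1 - e^{-n/3}$, the function $f\colon \{0,1\}^{2n} \to \{-1,+1\}$ given by

$$f(x) = \operatorname{sgn}\Big(\frac{1}{2} + \sum_{i=1}^{n} w_i x_i - 2^{\lfloor \alpha n \rfloor + 1} \sum_{i=n+1}^{2n} x_i\Big)$$

obeys

$$\deg_\pm(f \wedge f) \ge \lfloor \beta n \rfloor. \tag{6.1}$$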

Proof. Theorem 5.2 shows that with probability at least $1 - e^{-n/3}$ over the choice of
$w_1, w_2, \dots, w_n$, one has

$$R^+(f,d) \ge R^+(S,d), \qquad d = 0,1,\dots,\lfloor \alpha n \rfloor, \tag{6.2}$$

where $S = \{\pm1, \pm2, \pm3, \dots, \pm2^{\lfloor \alpha n \rfloor}\}$ and $\alpha > 0$ is the absolute constant from Theorem 4.3.
In the remainder of the proof, we will condition on this event.

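Suppose now that $\deg_\pm(f \wedge f) < \lfloor \beta n \rfloor$, where $\beta$ is a constant to be chosen later subject
to $0 < \beta < \alpha/4$. Then Theorem 2.3 implies that $R^+(f, \lfloor 4\beta n \rfloor) < 1/2$, which in view of
(6.2) leads to $R^+(S, \lfloor 4\beta n \rfloor) < 1/2$; here (6.2) applies because $\beta < \alpha/4$ gives $\lfloor 4\beta n \rfloor \le \lfloor \alpha n \rfloor$.
The last inequality violates Theorem 2.2 for small enough $\beta > 0$. Thus, (6.1) holds for $\beta$ small enough. □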

Recall that the technical crux of this paper is an optimal lower bound for the rational 
approximation of a halfspace. We will have occasion to appeal to this result again, and for 
this reason we formulate it as a theorem in its own right. 

THEOREM 6.2. A family of halfspaces $h_n\colon \{0,1\}^n \to \{-1,+1\}$, $n = 1,2,3,\dots$, exists such
that

$$R^+(h_n, d) = 1 - \exp\{-\Theta(n/d)\}, \qquad d = 1,2,\dots,\Theta(n). \tag{6.3}$$

Proof. The lower bound in (6.3) is immediate from Theorem 5.2 and the univariate lower
bounds in Theorem 2.2.

Next, every halfspace $h_n\colon \{0,1\}^n \to \{-1,+1\}$ constructed in Theorem 5.2 trivially
obeys $R^+(h_n, 1) \le 1 - \exp\{-\Theta(n)\}$. For $0 < \delta < 1$, Newman's classical work [28] shows
that $R^+([-1,-\delta] \cup [\delta,1], d) \le 1 - \delta^{\Theta(1/d)}$, whence by composition of the approximants
one obtains the upper bound in (6.3). □
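The composition step in the upper bound can be spelled out as a short derivation. It assumes, as a normalization chosen here for illustration, that $h_n = \operatorname{sgn} L_n$ for a linear form $L_n$ with $\delta \le |L_n(x)| \le 1$ on the hypercube; any univariate approximant $P/Q$ of degree $d$ on $[-1,-\delta] \cup [\delta,1]$ then gives the degree-$d$ approximant $P(L_n(x))/Q(L_n(x))$ to $h_n$ with the same error, and $Q(L_n(x)) > 0$ still holds.

```latex
% Assumption: h_n = sgn L_n for a linear form L_n with delta <= |L_n(x)| <= 1 on the cube.
% For the halfspace of Theorem 5.2 (on 2n variables, k = floor(alpha n)) one can take
%   L_n(x) = (1/2 + sum_{i<=n} w_i x_i - 2^{k+1} sum_{i>n} x_i) / M,  M = 1 + n 2^{k+1},
% so that delta = 1/(2M) = e^{-Theta(n)}.
\begin{align*}
R^+(h_n, d)
  &\le R^+([-1,-\delta]\cup[\delta,1],\, d)
  && \text{substitute } L_n(x) \text{ into a univariate approximant} \\
  &\le 1 - \delta^{\Theta(1/d)}
  && \text{Newman [28]} \\
  &= 1 - \exp\{-\Theta(n/d)\}
  && \text{since } \ln(1/\delta) = \Theta(n).
\end{align*}
```

The numerator of $L_n$ is an integer plus $1/2$, so its absolute value is at least $1/2$ and less than $M$, which is what the choice $\delta = 1/(2M)$ uses.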

Mixed intersection. Theorem 6.1 shows that the intersection of two halfspaces has
asymptotically the highest possible threshold degree. At the same time, Beigel et al. [6] showed that
the intersection of a constant number of majority functions on $\{0,1\}^n$, which are
particularly simple halfspaces, has threshold degree $O(\log n)$. We now derive a lower bound of
$\Omega(\sqrt{n \log n})$ on the threshold degree of the intersection of a halfspace and a majority
function, which improves quadratically on the previous bound in [34] and essentially matches
the upper bound $O(\sqrt{n \log n})$ given below in Remark 6.4.






THEOREM 6.3. A family of halfspaces $h_n\colon \{0,1\}^n \to \{-1,+1\}$, $n = 1,2,3,\dots$, exists such
that

$$\deg_\pm(h_n \wedge \mathrm{MAJ}_n) = \Theta(\sqrt{n \log n}). \tag{6.4}$$

Proof. The lower bound in (6.4) is immediate from Theorems 2.3, 2.5, and 6.2. The upper
bound in (6.4) is immediate from Theorems 2.3, 2.4, and 6.2. □

Remark 6.4. The construction of Theorem 6.3 is essentially best possible in that every
sequence of halfspaces $h_n\colon \{0,1\}^n \to \{-1,+1\}$, $n = 1,2,3,\dots$, obeys

$$\deg_\pm(h_n \wedge \mathrm{MAJ}_n) = O(\sqrt{n \log n}). \tag{6.5}$$

To derive this upper bound, recall that $R^+(h_n, 1) \le 1 - \exp\{-\Theta(n \log n)\}$ for every
halfspace $h_n\colon \{0,1\}^n \to \{-1,+1\}$, by a classical result due to Muroga [26]. Since
$R^+([-1,-\delta] \cup [\delta,1], d) \le 1 - \delta^{\Theta(1/d)}$ for $0 < \delta < 1$ by Newman [28], we obtain by
composition of approximants that $R^+(h_n, d) \le 1 - \exp\{-\Theta((n \log n)/d)\}$. This settles (6.5) in
view of Theorems 2.3 and 2.4.



Threshold density. In addition to threshold degree, several other complexity measures
are of interest when sign-representing Boolean functions by real polynomials. One such
complexity measure is density, i.e., the least $k$ for which a given function can be
sign-represented by a linear combination of $k$ parity functions. Formally, for a given function
$f\colon \{0,1\}^n \to \{-1,+1\}$, the threshold density $\operatorname{dns}(f)$ is the minimum size $|\mathcal{S}|$ of a family
$\mathcal{S} \subseteq \mathcal{P}(\{1,2,\dots,n\})$ such that

$$f(x) \equiv \operatorname{sgn}\Big(\sum_{S \in \mathcal{S}} \lambda_S \chi_S(x)\Big)$$

for some reals $\lambda_S$, $S \in \mathcal{S}$. It is clear from the definition that $\operatorname{dns}(f) \le 2^n$ for all functions
$f\colon \{0,1\}^n \to \{-1,+1\}$, and we will show that the intersection of two halfspaces on $\{0,1\}^n$
has threshold density $2^{\Theta(n)}$.
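The trivial bound $\operatorname{dns}(f) \le 2^n$ holds because $f$ coincides with its own Fourier expansion over all $2^n$ parities, so the sign of that expansion is $f$ itself. The Python sketch below checks this and also checks a small hand-written representation. It rests on two assumptions made for illustration: $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$, and, for the example functions, the encoding $-1 = $ true. The function names are hypothetical.

```python
from itertools import product


def chi(S, x):
    # Parity chi_S(x) = (-1)^{sum_{i in S} x_i} for x in {0,1}^n.
    return -1 if sum(x[i] for i in S) % 2 else 1


def all_subsets(n):
    # All 2^n subsets of {0, ..., n-1}, as sorted tuples.
    return [tuple(i for i in range(n) if (mask >> i) & 1) for mask in range(2 ** n)]


def sign_represents(f, family, weights, n):
    # True iff sum_S lambda_S chi_S(x) is nonzero with sign f(x) at every x in {0,1}^n.
    for x in product((0, 1), repeat=n):
        v = sum(lam * chi(S, x) for S, lam in zip(family, weights))
        if v == 0 or (1 if v > 0 else -1) != f(x):
            return False
    return True


def scaled_fourier(f, n):
    # 2^n * hat f(S) = sum_x f(x) chi_S(x): integer Fourier coefficients, one per subset S.
    points = list(product((0, 1), repeat=n))
    return {S: sum(f(x) * chi(S, x) for x in points) for S in all_subsets(n)}


# Example functions, encoded with -1 = true (an assumption for illustration).
def AND3(x):
    return -1 if all(x) else 1


def MAJ3(x):
    return -1 if sum(x) >= 2 else 1


coeffs = scaled_fourier(AND3, 3)
# The full expansion uses all 2^3 = 8 parities and sign-represents AND3.
print(sign_represents(AND3, list(coeffs), list(coeffs.values()), 3))
# Three parities suffice for MAJ3: sgn(chi_{0} + chi_{1} + chi_{2}).
print(sign_represents(MAJ3, [(0,), (1,), (2,)], [1, 1, 1], 3))
```

Both checks succeed: the full expansion sums to $2^n f(x)$ at every point, and the three-parity sum is positive exactly when at most one bit is set.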

To this end, we recall an elegant technique, due to Krause and Pudlák [21, Prop. 2.1], for
converting Boolean functions with high threshold degree into Boolean functions with high
threshold density. Their construction sends a function $f\colon \{0,1\}^n \to \{-1,+1\}$ to the
function $f^{\mathrm{KP}}\colon (\{0,1\}^n)^3 \to \{-1,+1\}$ given by

$$f^{\mathrm{KP}}(x,y,z) = f(\dots, (\overline{z_i} \wedge x_i) \vee (z_i \wedge y_i), \dots).$$
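In words, the bit $z_i$ selects whether the $i$th input of $f$ is read from $x_i$ (when $z_i = 0$) or from $y_i$ (when $z_i = 1$). A minimal Python sketch of the transform follows; the helper `kp` and the parity example (with $-1$ for an odd number of ones) are hypothetical illustrations, not taken from [21].

```python
from itertools import product


def kp(f, n):
    # Krause-Pudlak transform: f^KP(x, y, z) = f(u) with
    # u_i = (not z_i and x_i) or (z_i and y_i), i.e. z_i selects x_i or y_i.
    def f_kp(x, y, z):
        u = tuple(y[i] if z[i] else x[i] for i in range(n))
        return f(u)
    return f_kp


# Example: start from parity on 2 bits (-1 when the number of ones is odd).
def PARITY2(u):
    return -1 if sum(u) % 2 else 1


g = kp(PARITY2, 2)
cube = list(product((0, 1), repeat=2))
# With z = 0 the transform reads x; with z = 1 it reads y; mixed z mixes the two.
assert all(g(x, y, (0, 0)) == PARITY2(x) and g(x, y, (1, 1)) == PARITY2(y)
           for x in cube for y in cube)
print(g((1, 0), (0, 0), (0, 1)))  # reads u = (x_0, y_1) = (1, 0): prints -1
```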

THEOREM 6.5 (Krause and Pudlák). Every function $f\colon \{0,1\}^n \to \{-1,+1\}$ obeys

$$\operatorname{dns}(f^{\mathrm{KP}}) \ge 2^{\deg_\pm(f)}.$$

We are now in a position to obtain the desired density results. 

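THEOREM 6.6. A family of halfspaces $h_n\colon \{0,1\}^n \to \{-1,+1\}$, $n = 1,2,3,\dots$, exists such
that

$$\operatorname{dns}(h_n \wedge h_n) \ge \exp\{\Theta(n)\}, \tag{6.6}$$

$$\operatorname{dns}(h_n \wedge \mathrm{MAJ}_n) \ge \exp\{\Theta(\sqrt{n \log n})\}. \tag{6.7}$$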
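Proof. The parity of several parity functions is another parity function. As a result,

$$\max\{\operatorname{dns}(h_n \wedge h_n)\} \ge \max\{\operatorname{dns}(F \wedge F)\}, \tag{6.8}$$

where the maximum on the left is over all halfspaces $h_n\colon \{0,1\}^n \to \{-1,+1\}$ and the
maximum on the right is over arbitrary functions $F\colon \{0,1\}^m \to \{-1,+1\}$ (for arbitrary
$m$) such that $\operatorname{dns}(F) \le n. $ Indeed, such an $F$ is a halfspace evaluated at $n$ parities of its input, so
substituting these parities into a sign-representation of $h_n \wedge h_n$ turns each parity there into a parity of
the input of $F \wedge F$. For each $n = 1,2,3,\dots$, Theorem 6.1 ensures the existence of
a halfspace $f_n\colon \{0,1\}^n \to \{-1,+1\}$ with $\deg_\pm(f_n \wedge f_n) = \Omega(n)$. By Theorem 6.5, the
function $(f_n \wedge f_n)^{\mathrm{KP}} = f_n^{\mathrm{KP}} \wedge f_n^{\mathrm{KP}}$ has threshold density $\exp\{\Omega(n)\}$. Since $\operatorname{dns}(f_n^{\mathrm{KP}}) \le
4n + 1$, the right member of (6.8), with $4n + 1$ in place of $n$, is at least $\exp\{\Omega(n)\}$.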

This completes the proof of (6.6). The proof of (6.7) is closely analogous, with
Theorem 6.3 used instead of Theorem 6.1. □
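The bound $\operatorname{dns}(f_n^{\mathrm{KP}}) \le 4n + 1$ used in the proof can be traced to an identity for the selector $u_i = (\overline{z_i} \wedge x_i) \vee (z_i \wedge y_i)$: writing $a = \chi(x_i)$, $b = \chi(y_i)$, $c = \chi(z_i)$ with $\chi(v) = (-1)^v$, one can check that $4u_i = 2 - a - b - ac + bc$. Substituting this into a halfspace $\operatorname{sgn}(w_0 + \sum_i w_i u_i)$ produces the constant character together with the four parities $\chi_{\{x_i\}}, \chi_{\{y_i\}}, \chi_{\{x_i,z_i\}}, \chi_{\{y_i,z_i\}}$ per coordinate. The Python sketch below, whose helper names are hypothetical, verifies the identity on all eight inputs and counts the characters.

```python
from itertools import product


def chi(v):
    # One-variable character (-1)^v on {0,1}.
    return -1 if v else 1


def selector(x, y, z):
    # (not z and x) or (z and y), as a 0/1 value.
    return y if z else x


def selector_identity_holds():
    # Check 4 * selector = 2 - chi(x) - chi(y) - chi(x) chi(z) + chi(y) chi(z) on all 8 inputs.
    for x, y, z in product((0, 1), repeat=3):
        a, b, c = chi(x), chi(y), chi(z)
        if 4 * selector(x, y, z) != 2 - a - b - a * c + b * c:
            return False
    return True


def kp_halfspace_characters(n):
    # Parities that can occur after substituting the identity into sgn(w_0 + sum_i w_i u_i):
    # the empty set plus {x_i}, {y_i}, {x_i, z_i}, {y_i, z_i} for each i.
    chars = {frozenset()}
    for i in range(n):
        xi, yi, zi = f"x{i}", f"y{i}", f"z{i}"
        chars |= {frozenset({xi}), frozenset({yi}),
                  frozenset({xi, zi}), frozenset({yi, zi})}
    return chars


print(selector_identity_holds())         # True
print(len(kp_halfspace_characters(10)))  # 41 = 4 * 10 + 1
```

Some of these characters may receive zero weight for a particular halfspace, which is why the count is an upper bound on the density rather than its exact value.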

The lower bounds in Theorem 6.6 are essentially optimal. Specifically, (6.6) is tight
for trivial reasons, whereas the lower bound (6.7) nearly matches the upper bound of
$\exp\{O(\sqrt{n}\,\log^{3/2} n)\}$ that follows from (6.5).
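The upper bound comes from counting: a sign-representation of degree $d$ on $m$ variables involves only parities $\chi_S$ with $|S| \le d$, so the density is at most $\sum_{i \le d} \binom{m}{i} \le (em/d)^d = \exp\{O(d \log m)\}$. With $m = 2n$ and $d = O(\sqrt{n \log n})$ from (6.5), this gives $\exp\{O(\sqrt{n \log n} \cdot \log n)\}$. The Python sketch below, with hypothetical function names, spot-checks the counting estimate numerically; it is an illustration, not part of the argument.

```python
import math


def num_low_degree_parities(m, d):
    # Number of parities chi_S on m variables with |S| <= d; any degree-d
    # sign-representation uses at most this many, so it bounds the density.
    return sum(math.comb(m, i) for i in range(d + 1))


def log_counting_bound(m, d):
    # Natural log of the standard estimate sum_{i<=d} C(m, i) <= (e m / d)^d, valid for 1 <= d <= m.
    return d * math.log(math.e * m / d)


# Spot-check the estimate on small cases.
for m in range(1, 80):
    for d in range(1, m + 1):
        assert math.log(num_low_degree_parities(m, d)) <= log_counting_bound(m, d) + 1e-9

# Example: m = 2n variables and degree d = ceil(sqrt(n ln n)) for n = 10_000.
n = 10_000
d = math.ceil(math.sqrt(n * math.log(n)))
print(d, round(math.log(num_low_degree_parities(2 * n, d))))
```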

We also note that Theorem 6.5 readily generalizes to linear combinations of
conjunctions rather than parity functions. In other words, if a function $f\colon \{0,1\}^n \to \{-1,+1\}$
has threshold degree $d$ and $f^{\mathrm{KP}}(x,y,z) \equiv \operatorname{sgn}\big(\sum_{i=1}^{N} \lambda_i T_i(x,y,z)\big)$ for some conjunctions
$T_1, \dots, T_N$ of the literals $x_1, y_1, z_1, \dots, x_n, y_n, z_n, \overline{x_1}, \overline{y_1}, \overline{z_1}, \dots, \overline{x_n}, \overline{y_n}, \overline{z_n}$, then
$N \ge 2^{\Omega(d)}$. With this remark in mind, Theorem 6.6 and its proof readily carry over to this
alternate definition of density.